NSF PAR Search | NSF Public Access Repository

Performance Characterization of HTAP Workloads

https://doi.org/10.1109/ICDE51399.2021.00162

Sirin, Utku; Dwarkadas, Sandhya; Ailamaki, Anastasia (April 2021, 2021 IEEE 37th International Conference on Data Engineering (ICDE))
Hybrid Transactional and Analytical Processing (HTAP) systems have become popular in the past decade. HTAP systems allow running transactional and analytical processing workloads on the same data and hardware. As a result, they suffer from workload interference. Despite the large body of existing work in HTAP systems and architectures, none of the existing work has systematically analyzed workload interference for HTAP systems. In this work, we characterize workload interference for HTAP systems. We show that the OLTP throughput drops by up to 42% due to sharing the hardware resources. Partitioning the last-level cache (LLC) among the OLTP and OLAP workloads can significantly improve the OLTP throughput without hurting the OLAP throughput. The OLAP throughput is significantly reduced due to sharing the data. The OLAP execution time is exponentially increased if the OLTP workload generates fresh tuples faster than the HTAP system propagates them. Therefore, in order to minimize the workload interference, HTAP systems should isolate the OLTP and OLAP workloads in the shared hardware resources and should allocate enough resources to fresh tuple propagation to propagate the fresh tuples faster than they are generated.
Full Text Available
Workload Interference Analysis for HTAP

Sirin, Utku; Dwarkadas, Sandhya; Ailamaki, Anastasia (January 2021, gong show at the Conference on Innovative Data Systems Research (CIDR) 2021)
Hybrid Transactional and Analytical Processing (HTAP) systems suffer from workload interference at the software and hardware level. We examine workload interference for HTAP systems and highlight investigation directions to mitigate the interference. We use the popular two-copy HTAP architecture. The OLTP and OLAP sides are independent components with their own private copies of the data. The OLTP side is a row-store, whereas the OLAP side is a column-store. The OLTP and OLAP sides are connected by means of an intermediate data structure, delta, that keeps track of the fresh tuples that are generated by the OLTP side, but not yet transferred to the OLAP side. OLTP transactions register their modifications to delta before committing. OLAP queries first propagate fresh tuples from the OLTP side to the OLAP side and then perform query execution over the data at the OLAP side. HTAP systems suffer from interference at both the software and hardware level. Software-level interference depends on the OLTP and fresh tuple propagation throughput. In order to minimize interference, HTAP systems should ensure that fresh tuple propagation throughput is greater than the throughput of the OLTP transactions that generate the fresh tuples. Hardware-level interference depends on the demand for shared resources such as LLC and memory bandwidth by the OLTP and OLAP workloads. HTAP systems should isolate the OLTP and OLAP workloads in the shared resources and use micro-architectural resource allocation policies that assign the optimal amount of resources to OLTP and OLAP workloads to minimize hardware-level interference.
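The two-copy architecture described in this abstract can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the authors' system: the class `TwoCopyHTAP`, its method names, and the `(price, qty)` schema are hypothetical, and everything runs in a single thread, so it shows only the data flow (row-store, delta, column-store) and none of the interference effects.

```python
from dataclasses import dataclass, field


@dataclass
class TwoCopyHTAP:
    """Toy two-copy store (hypothetical names, single-threaded)."""
    # OLTP copy, row-oriented: key -> (price, qty)
    rows: dict = field(default_factory=dict)
    # OLAP copy, column-oriented: one list per attribute
    columns: dict = field(
        default_factory=lambda: {"key": [], "price": [], "qty": []}
    )
    # key -> index of that key inside the column lists
    position: dict = field(default_factory=dict)
    # delta: fresh tuples generated by OLTP, not yet transferred to OLAP
    delta: list = field(default_factory=list)

    def oltp_upsert(self, key, price, qty):
        # The transaction registers its modification in delta before committing.
        self.delta.append((key, price, qty))
        self.rows[key] = (price, qty)  # "commit" to the OLTP copy

    def propagate(self):
        # Apply the fresh tuples to the OLAP copy in commit order.
        for key, price, qty in self.delta:
            if key in self.position:
                i = self.position[key]
                self.columns["price"][i] = price
                self.columns["qty"][i] = qty
            else:
                self.position[key] = len(self.columns["key"])
                self.columns["key"].append(key)
                self.columns["price"].append(price)
                self.columns["qty"].append(qty)
        propagated = len(self.delta)
        self.delta.clear()
        return propagated

    def olap_total_revenue(self):
        # OLAP queries first propagate fresh tuples, then scan the column-store.
        self.propagate()
        return sum(p * q for p, q in zip(self.columns["price"], self.columns["qty"]))


db = TwoCopyHTAP()
db.oltp_upsert("a", 2.0, 3)
db.oltp_upsert("b", 5.0, 1)
db.oltp_upsert("a", 2.0, 4)      # update of an existing key
print(db.olap_total_revenue())   # 2.0 * 4 + 5.0 * 1 = 13.0
```

In this sketch the cost of `propagate` grows with the number of entries in `delta`, which is one way to see why the abstract ties software-level interference to the ratio between OLTP throughput and fresh tuple propagation throughput.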
Full Text Available
Adaptive partitioning and indexing for in situ query processing

https://doi.org/10.1007/s00778-019-00580-x

Olma, Matthaios; Karpathiotakis, Manos; Alagiannis, Ioannis; Athanassoulis, Manos; Ailamaki, Anastasia (January 2020, The VLDB journal)

The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. To alleviate the loading cost, in situ query processing systems operate directly over raw data and offer instant access to data. At the same time, analytical workloads have an increasing number of queries. Typically, each query focuses on a constantly shifting, yet small, range. As a result, minimizing the workload latency requires the benefits of indexing in in situ query processing. In this paper, we present an online partitioning and indexing scheme, along with a partitioning and indexing tuner tailored for in situ querying engines. The proposed system design improves query execution time by taking into account user query patterns, to (i) partition raw data files logically and (ii) build lightweight partition-specific indexes for each partition. We build an in situ query engine called Slalom to showcase the impact of our design. Slalom employs adaptive partitioning and builds non-obtrusive indexes in different partitions on-the-fly based on lightweight query access pattern monitoring. As a result of its lightweight nature, Slalom achieves efficient query processing over raw data with minimal memory consumption. Our experimentation with both microbenchmarks and real-life workloads shows that Slalom outperforms state-of-the-art in situ engines and achieves query response times comparable to those of a fully indexed DBMS, offering lower cumulative query execution times for query workloads with increasing size and unpredictable access patterns.
Full Text Available
GPU rasterization for real-time spatial aggregation over arbitrary polygons

https://doi.org/10.14778/3157794.3157803

Zacharatou, Eleni Tzirita; Doraiswamy, Harish; Ailamaki, Anastasia; Silva, Cláudio T.; Freire, Juliana (November 2017, Proceedings of the VLDB Endowment)

Full Text Available
Interactive Visual Exploration of Spatio-Temporal Urban Data Sets using Urbane

https://doi.org/10.1145/3183713.3193559

Doraiswamy, Harish; Tzirita Zacharatou, Eleni; Miranda, Fabio; Lage, Marcos; Ailamaki, Anastasia; Silva, Cláudio T.; Freire, Juliana (January 2018, SIGMOD '18)

The recent explosion in the number and size of spatio-temporal data sets from urban environments and social sensors creates new opportunities for data-driven approaches to understand and improve cities. Visual analytics systems like Urbane aim to empower domain experts to explore multiple data sets, at different time and space resolutions. Since these systems rely on computationally-intensive spatial aggregation queries that slice and summarize the data over different regions, an important challenge is how to attain interactivity. While traditional pre-aggregation approaches support interactive exploration, they are unsuitable in this setting because they do not support ad-hoc query constraints or polygons of arbitrary shapes. To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU). By doing so, Raster Join evaluates queries on the fly at interactive speeds on commodity laptops and desktops. In this demonstration, we showcase the efficiency of Raster Join by integrating it with Urbane and enabling interactivity. Demo visitors will interact with Urbane to filter and visualize several urban data sets over multiple resolutions.
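The idea of turning a spatial aggregation into drawing operations on a canvas can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions, not the Raster Join implementation: the real approach uses the GPU rendering pipeline, whereas the hypothetical functions `point_in_polygon` and `raster_count` below emulate the two drawing passes with nested loops. The result is approximate at pixel resolution, because a point is attributed to a polygon whenever the centre of its pixel falls inside that polygon.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_count(points, polygons, width, height, extent):
    """Approximate COUNT(points) per polygon on a width x height canvas.

    extent = (xmin, ymin, xmax, ymax) of the data in world coordinates.
    """
    xmin, ymin, xmax, ymax = extent
    sx = width / (xmax - xmin)
    sy = height / (ymax - ymin)

    # Pass 1: "draw" the points; each pixel accumulates a partial count.
    canvas = [[0] * width for _ in range(height)]
    for x, y in points:
        col = min(int((x - xmin) * sx), width - 1)
        row = min(int((y - ymin) * sy), height - 1)
        canvas[row][col] += 1

    # Pass 2: "draw" each polygon; every covered non-empty pixel adds its count.
    result = {}
    for name, polygon in polygons.items():
        total = 0
        for row in range(height):
            cy = ymin + (row + 0.5) / sy  # pixel centre in world coordinates
            for col in range(width):
                cx = xmin + (col + 0.5) / sx
                if canvas[row][col] and point_in_polygon(cx, cy, polygon):
                    total += canvas[row][col]
        result[name] = total
    return result


points = [(1.0, 1.0), (2.0, 2.0), (9.0, 7.0), (9.0, 1.0)]
polygons = {
    "south_west": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "triangle": [(5, 5), (10, 5), (10, 10)],
}
counts = raster_count(points, polygons, width=10, height=10, extent=(0, 0, 10, 10))
print(counts)  # {'south_west': 2, 'triangle': 1}
```

A finer canvas reduces the pixel-level approximation error at the cost of more work per polygon; how the actual system handles accuracy near polygon boundaries is not stated in the abstract.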
Full Text Available
The Beckman Report on Database Research

https://doi.org/10.1145/2694428.2694441

Abadi, Daniel; Agrawal, Rakesh; Ailamaki, Anastasia; Balazinska, Magdalena; Bernstein, Philip A.; Carey, Michael J.; Chaudhuri, Surajit; Dean, Jeffrey; Doan, AnHai; Franklin, Michael J.; et al (December 2014, ACM SIGMOD Record)
Every few years a group of database researchers meets to discuss the state of database research, its impact on practice, and important new directions. This report summarizes the discussion and conclusions of the eighth such meeting, held October 14-15, 2013 in Irvine, California. It observes that Big Data has now become a defining challenge of our time, and that the database research community is uniquely positioned to address it, with enormous opportunities to make transformative impact. To do so, the report recommends significantly more attention to five research areas: scalable big/fast data infrastructures; coping with diversity in the data management landscape; end-to-end processing and understanding of data; cloud services; and managing the diverse roles of people in the data life cycle.
Full Text Available
